AITopics | massive sparse dataset

Dimensionality Reduction of Massive Sparse Datasets Using Coresets

Dan Feldman, Mikhail Volkov, Daniela Rus

Neural Information Processing Systems, Apr-21-2026, 22:11:14 GMT

In this paper we present a practical solution with performance guarantees to the problem of dimensionality reduction for very large scale sparse matrices. We show applications of our approach to computing the Principal Component Analysis (PCA) of any n × d matrix, using one pass over the stream of its rows. Our solution uses coresets: a scaled subset of the n rows that approximates their sum of squared distances to every k-dimensional affine subspace. An open theoretical problem has been to compute such a coreset that is independent of both n and d. An open practical problem has been to compute a non-trivial approximation to the PCA of very large but sparse databases such as the Wikipedia document-term matrix in a reasonable time. We answer both of these questions affirmatively. Our main technical result is a new framework for deterministic coreset constructions based on a reduction to the problem of counting items in a stream.
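The coreset property described in the abstract can be made concrete with a small NumPy sketch. It only illustrates the definition: the cost of a weighted set of rows against one k-dimensional affine subspace. The helper names, the random test data, and the uniform-sampling subset are assumptions made for illustration; uniform sampling is a naive baseline and is not the deterministic construction the paper proposes.

```python
import numpy as np


def affine_subspace_cost(points, weights, origin, basis):
    """Weighted sum of squared distances from the rows of `points` to the
    affine subspace {origin + basis @ z}. `basis` is d x k and is assumed
    to have orthonormal columns."""
    centered = points - origin
    projected = centered @ basis  # coordinates inside the subspace
    # Pythagoras: squared distance = squared length - squared projection.
    residual_sq = (centered ** 2).sum(axis=1) - (projected ** 2).sum(axis=1)
    return float(weights @ residual_sq)


def random_affine_subspace(d, k, rng):
    """Hypothetical helper: a random k-dimensional affine subspace of R^d."""
    origin = rng.normal(size=d)
    basis, _ = np.linalg.qr(rng.normal(size=(d, k)))  # orthonormal columns
    return origin, basis


def uniform_weighted_subset(points, m, rng):
    """Naive baseline (NOT the paper's method): pick m rows uniformly and
    scale each by n / m so the subset's cost estimates the full cost."""
    n = points.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    return points[idx], np.full(m, n / m)


rng = np.random.default_rng(0)
A = rng.normal(size=(2000, 20))            # stand-in for the n x d input
C, w = uniform_weighted_subset(A, 200, rng)

origin, basis = random_affine_subspace(20, 3, rng)
full_cost = affine_subspace_cost(A, np.ones(len(A)), origin, basis)
subset_cost = affine_subspace_cost(C, w, origin, basis)
rel_err = abs(subset_cost - full_cost) / full_cost
print(f"relative error on this subspace: {rel_err:.3f}")
```

A coreset in the abstract's sense must keep this relative error small for every k-dimensional affine subspace at once, not just for the single random one checked here. Because the coreset is a scaled subset of the input rows, each selected row keeps its original sparsity pattern.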

artificial intelligence, coreset, machine learning, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
Asia > Middle East > Israel (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.62)

Dimensionality Reduction of Massive Sparse Datasets Using Coresets

Neural Information Processing Systems, Nov-21-2025, 15:17:37 GMT

In this paper we present a practical solution with performance guarantees to the problem of dimensionality reduction for very large scale sparse matrices. We show applications of our approach to computing the Principal Component Analysis (PCA) of any $n\times d$ matrix, using one pass over the stream of its rows. Our solution uses coresets: a scaled subset of the $n$ rows that approximates their sum of squared distances to \emph{every} $k$-dimensional \emph{affine} subspace. An open theoretical problem has been to compute such a coreset that is independent of both $n$ and $d$. An open practical problem has been to compute a non-trivial approximation to the PCA of very large but sparse databases such as the Wikipedia document-term matrix in a reasonable time. We answer both of these questions affirmatively. Our main technical result is a new framework for deterministic coreset constructions based on a reduction to the problem of counting items in a stream.
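The abstract does not say which stream-counting algorithm the reduction targets. As background only, here is a minimal sketch of one classic deterministic algorithm for counting items in a stream, the Misra-Gries frequent-items summary. Choosing Misra-Gries is an assumption made for illustration, and the function name and toy stream are hypothetical; this is not the paper's coreset construction.

```python
def misra_gries(stream, num_counters):
    """One-pass frequent-items summary using at most `num_counters` counters.

    For a stream of length n, every returned estimate undercounts the true
    frequency by at most n / (num_counters + 1) and never overcounts.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < num_counters:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the zeros.
            # The incoming item itself is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Toy stream (illustrative data): two heavy items and a tail of rare ones.
stream = ["a"] * 50 + ["b"] * 30 + list("cdefghij") * 2
estimates = misra_gries(stream, num_counters=3)
print(estimates)
```

The point of the analogy is the resource profile: a single pass, a deterministic update rule, and memory that depends on the number of counters rather than on the stream length.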

dimensionality reduction, massive sparse dataset, name change, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.44)

Reviews: Dimensionality Reduction of Massive Sparse Datasets Using Coresets

Neural Information Processing Systems, Jan-20-2025, 18:43:22 GMT

This paper makes some pretty critical mistakes regarding previous work. For one, the authors cite [8], but they should in fact be citing Cohen et al., "Dimensionality Reduction for k-Means Clustering and Low Rank Approximation". This is not just a typo: the authors go on to state a result of [8] about operator norm rather than the result of the Cohen et al. paper. Namely, the Cohen et al. paper achieves O(k/eps^2) rescaled columns deterministically for exactly the same problem considered in this submission (see part 5 of Lemma 11 and Section 7.3, based on BSS). This is much stronger than the O(k^2/eps^2) rescaled columns achieved in the submission. This directly contradicts their sentence "Our main result is the first algorithm for computing an (k,eps)-coreset C of size independent of both n and d". The authors also say later that [8,7] minimize the 2-norm; [8] is the wrong reference again!
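For a rough sense of scale, take the two bounds quoted in the review at face value and ignore the constants hidden in the O-notation. The values k = 10 and eps = 0.1 are chosen purely for illustration:

$$
\frac{k}{\varepsilon^2} = \frac{10}{0.01} = 1000, \qquad
\frac{k^2}{\varepsilon^2} = \frac{100}{0.01} = 10000, \qquad
\frac{k^2/\varepsilon^2}{k/\varepsilon^2} = k = 10.
$$

On the reviewer's account, the gap between the two sizes grows linearly with k.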

coreset, dimensionality reduction, massive sparse dataset, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.63)

Dimensionality Reduction of Massive Sparse Datasets Using Coresets

Dan Feldman, Mikhail Volkov, Daniela Rus

Neural Information Processing Systems, Feb-14-2020, 12:26:55 GMT

In this paper we present a practical solution with performance guarantees to the problem of dimensionality reduction for very large scale sparse matrices. We show applications of our approach to computing the Principal Component Analysis (PCA) of any $n\times d$ matrix, using one pass over the stream of its rows. Our solution uses coresets: a scaled subset of the $n$ rows that approximates their sum of squared distances to \emph{every} $k$-dimensional \emph{affine} subspace. An open theoretical problem has been to compute such a coreset that is independent of both $n$ and $d$. An open practical problem has been to compute a non-trivial approximation to the PCA of very large but sparse databases such as the Wikipedia document-term matrix in a reasonable time.

coreset, dimensionality reduction, massive sparse dataset, (3 more...)

Technology:

Information Technology > Data Science > Data Mining (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.66)
